Approximate Nearest Neighbors Algorithm - a short version
Authors
Abstract
dimensional Euclidean space. Given $N$ points $\{x_j\}$ in $\mathbb{R}^d$, the algorithm attempts to find the $k$ nearest neighbors of each $x_j$, where $k$ is a user-specified integer parameter. The algorithm is iterative, and its CPU time requirements are proportional to $T \cdot N \cdot (d \cdot (\log d) + k \cdot (d + \log k) \cdot (\log N)) + N \cdot k^2 \cdot (d + \log k)$, with $T$ the number of iterations performed. The memory requirements of the procedure are of the order $N \cdot (d + k)$. A byproduct of the scheme is a data structure that permits a rapid search for the $k$ nearest neighbors among $\{x_j\}$ for an arbitrary point $x \in \mathbb{R}^d$. The cost of each such query is proportional to $T \cdot (d \cdot (\log d) + \log(N/k) \cdot k \cdot (d + \log k))$, and the memory requirements for the requisite data structure are of the order $N \cdot (d + k) + T \cdot (d + N)$. The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme’s behavior for certain types of distributions of $\{x_j\}$, and illustrate its performance via several numerical examples.
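The three ingredients named in the abstract (random rotations, divide-and-conquer, local graph search) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the dense Gram-Schmidt rotation (which does not achieve the d · (log d) cost quoted above), the median split that cycles through coordinates, the box-size threshold of 2k + 1, and the single neighbors-of-neighbors pass are all assumptions made for illustration, and the query data structure is not covered.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_rotation(d, rng):
    """Random orthogonal d x d matrix (rows) via Gram-Schmidt on Gaussian vectors.

    Assumption: a dense O(d^3) construction stands in for a fast rotation.
    """
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:  # discard numerically degenerate draws
            basis.append([x / norm for x in v])
    return basis


def split_into_boxes(idx, rotated, k, depth=0):
    """Divide and conquer: halve idx at the median of one rotated coordinate.

    Recursion stops once a box holds at most 2k + 1 points, so every box
    keeps at least k + 1 points whenever the input has at least k + 1.
    """
    if len(idx) <= 2 * k + 1:
        return [idx]
    axis = depth % len(rotated[0])
    idx = sorted(idx, key=lambda i: rotated[i][axis])
    mid = len(idx) // 2
    return (split_into_boxes(idx[:mid], rotated, k, depth + 1)
            + split_into_boxes(idx[mid:], rotated, k, depth + 1))


def merge(points, i, current, candidates, k):
    """Keep the k candidates closest to point i, excluding i itself."""
    pool = set(current) | set(candidates)
    pool.discard(i)
    return sorted(pool, key=lambda j: sq_dist(points[i], points[j]))[:k]


def approximate_knn(points, k, T=5, seed=0):
    """Return, for each point, a list of k suspected nearest-neighbor indices."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    nbrs = [[] for _ in range(n)]
    for _ in range(T):  # T independent rotate-and-split iterations
        Q = random_rotation(d, rng)
        rotated = [[sum(q * x for q, x in zip(row, p)) for row in Q]
                   for p in points]
        for box in split_into_boxes(list(range(n)), rotated, k):
            for i in box:
                nbrs[i] = merge(points, i, nbrs[i], box, k)
    # Local graph search: also try the neighbors of each current neighbor.
    snapshot = [list(a) for a in nbrs]
    for i in range(n):
        cand = [j2 for j in snapshot[i] for j2 in snapshot[j]]
        nbrs[i] = merge(points, i, snapshot[i], cand, k)
    return nbrs
```

Calling `approximate_knn(points, k=5, T=5)` on a list of equal-length coordinate lists returns one index list per point. Raising `T` costs more time but gives each point more chances to share a box with its true neighbors, which mirrors the role of T in the cost estimates above.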
Similar resources
A Randomized Approximate Nearest Neighbors Algorithm - a Short Version
dimensional Euclidean space. Given $N$ points $\{x_j\}$ in $\mathbb{R}^d$, the algorithm attempts to find the $k$ nearest neighbors of each $x_j$, where $k$ is a user-specified integer parameter. The algorithm is iterative, and its CPU time requirements are proportional to $T \cdot N \cdot (d \cdot (\log d) + k \cdot (d + \log k) \cdot (\log N)) + N \cdot k^2 \cdot (d + \log k)$, with $T$ the number of iterations performed. The memory requirements of the proc...
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and the Internet dominate users' lives, they bring not only business opportunities and time savings but also hardware and software threats such as information theft and system penetration. Security is the top priority in preventing cyber-attacks, and users should first detect the type of attack, because virtual environments are not moni...
Quantitative Analysis of Nearest-Neighbors Search in High-Dimensional Sampling-Based Motion Planning
We quantitatively analyze the performance of exact and approximate nearest-neighbors algorithms on increasingly high-dimensional problems in the context of sampling-based motion planning. We study the impact of the dimension, number of samples, distance metrics, and sampling schemes on the efficiency and accuracy of nearest-neighbors algorithms. Efficiency measures computation time and accuracy...
EFANNA: An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph
Approximate nearest neighbor (ANN) search is a fundamental problem in many areas of data mining, machine learning and computer vision. The performance of traditional hierarchical structure (tree) based methods decreases as the dimensionality of data grows, while hashing based methods usually lack efficiency in practice. Recently, the graph based methods have drawn considerable attention. The ma...
Fast Large-Scale Approximate Graph Construction for NLP
Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorith...